class: center, middle, inverse, title-slide # Introduction to R for Data Analysis ## Data Visualization - Part 2 ### Johannes Breuer & Stefan Jünger ### 2021-08-05 --- layout: true --- ## Content of this session .pull-left[ **What we will do** - Visual checks of model assumptions - Plotting coefficients - Plotting predictions - Plotting multiple models ] .pull-right[ **What we won't do** - Again, no crazy models - No bayesian statistics ] --- ## Packages in this sessions .pull-left[ We will again rely on packages from the `easystats` collection - `paramters` - `performance` - **`see`** ] .pull-right[ <img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\logo_wall.png" width="1600" style="display: block; margin: auto;" /> ] .pull-left[ While it's straightforward to use, it still has some limitations. From time to time we will therefore also switch to the `sjPlot` package ] .pull-right[ <img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\sjplot_logo.png" width="160" style="display: block; margin: auto;" /> ] --- ## Estimating a linear model (OLS) ```r linear_model <- lm( mean_trust ~ age_cat + sex + education_cat + pol_leaning_cat, data = gp_covid ) library(parameters) model_parameters(linear_model) ``` ``` ## Parameter | Coefficient | SE | 95% CI | t(3106) | p ## ------------------------------------------------------------------------------------ ## (Intercept) | 3.52 | 0.08 | [ 3.37, 3.67] | 46.55 | < .001 ## age_cat | 0.03 | 5.25e-03 | [ 0.02, 0.04] | 6.13 | < .001 ## sex | 0.09 | 0.02 | [ 0.05, 0.14] | 3.87 | < .001 ## education_cat | 0.02 | 0.02 | [-0.02, 0.05] | 0.89 | 0.374 ## pol_leaning_cat [left] | 0.05 | 0.03 | [-0.01, 0.10] | 1.72 | 0.085 ## pol_leaning_cat [right] | -0.28 | 0.05 | [-0.38, -0.17] | -5.21 | < .001 ``` --- ## Quick inspection using `base R` .pull-left[ ```r par(mfrow = c(2, 2)) plot(linear_model) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> ] --- ## Using `easystats` .pull-left[ ```r library(performance) library(see) check_normality(linear_model) %>% plot() ``` ] .pull-right[ ``` ## Warning: Non-normality of residuals detected (p < .001). ``` <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> ] --- ## QQ-Plot (`easystats`) .pull-left[ ```r check_normality(linear_model) %>% plot(type = "qq") ``` ] .pull-right[ ``` ## Warning: Non-normality of residuals detected (p < .001). ``` <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] --- ## Heteroscedasticity (`easystats`) .pull-left[ ```r check_heteroscedasticity(linear_model) %>% plot() ``` ] .pull-right[ ``` ## Warning: Heteroscedasticity (non-constant error variance) detected (p < .001). ``` <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] --- ## Check outliers (`easystats`) .pull-left[ ```r check_outliers(linear_model) %>% plot() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] --- ## Coefficient plot (`easystats`) .pull-left[ ```r library(parameters) model_parameters(linear_model) %>% plot() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] --- ## Changing labels: it's `ggplot2`-based .pull-left[ ```r library(ggplot2) model_parameters(linear_model) %>% plot() + scale_x_discrete( labels = # in reversed order! c( "Pol. Right (Ref. Center)", "Pol. Left (Ref. Center)", "Education", "Sex", "Age" ) ) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] --- ## Making it more dim and scientific .pull-left[ ```r library(ggplot2) model_parameters(linear_model) %>% plot() + scale_x_discrete( labels = # in reversed order! c( "Pol. Right (Ref. Center)", "Pol. Left (Ref. Center)", "Education", "Sex", "Age" ) ) + scale_colour_grey(start = 0, end = 0) + guides(color = "none") ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] --- ## Predictions (`sjPlot`) .pull-left[ ```r library(sjPlot) linear_model %>% plot_model( type = "pred", terms = "age_cat" ) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ] --- ## Again, it's `ggplot2`-based .pull-left[ ```r linear_model %>% plot_model( type = "pred", terms = "age_cat" ) + xlab("Age") + ylab("Trust") + ggtitle("Linear Model") + theme_bw() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] --- ## Estimating multiple models ```r gp_covid <- gp_covid %>% mutate( mean_trust_median = ifelse( mean_trust <= median(mean_trust, na.rm = TRUE), 0, 1 ) ) ``` --- ## Logistic regressions ```r logistic_model <- glm( mean_trust_median ~ age_cat + sex + education_cat + pol_leaning_cat, data = gp_covid, family = binomial(link = "logit"), ) model_parameters(logistic_model) ``` ``` ## Parameter | Log-Odds | SE | 95% CI | z | p ## --------------------------------------------------------------------------- ## (Intercept) | -1.16 | 0.23 | [-1.61, -0.72] | -5.09 | < .001 ## age_cat | 0.09 | 0.02 | [ 0.06, 0.12] | 5.81 | < .001 ## sex | 0.17 | 0.07 | [ 0.03, 0.31] | 2.32 | 0.020 ## education_cat | 0.05 | 0.06 | [-0.06, 0.16] | 0.86 | 0.391 ## pol_leaning_cat [left] | 0.09 | 0.08 | [-0.07, 0.25] | 1.15 | 0.251 ## pol_leaning_cat [right] | -0.66 | 0.17 | [-1.00, -0.33] | -3.88 | < .001 ``` --- ## Linear probability model ```r linear_probability_model <- lm( mean_trust_median ~ age_cat + sex + education_cat + pol_leaning_cat, data = gp_covid, ) model_parameters(linear_probability_model) ``` ``` ## Parameter | Coefficient | SE | 95% CI | t(3106) | p ## ------------------------------------------------------------------------------------ ## (Intercept) | 0.22 | 0.05 | [ 0.11, 0.33] | 3.96 | < .001 ## age_cat | 0.02 | 3.82e-03 | [ 0.01, 0.03] | 5.87 | < .001 ## sex | 0.04 | 0.02 | [ 0.01, 0.08] | 2.32 | 0.020 ## education_cat | 0.01 | 0.01 | [-0.01, 0.04] | 0.86 | 0.392 ## pol_leaning_cat [left] | 0.02 | 0.02 | [-0.02, 0.06] | 1.16 | 0.247 ## pol_leaning_cat [right] | -0.15 | 0.04 | [-0.23, -0.08] | -3.94 | < .001 ``` --- ## Comparing model fits (`easystats`) .pull-left[ ```r library(performance) compare_performance( linear_model, logistic_model, linear_probability_model ) %>% plot() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> ] --- ## Comparing model parameters (`sjPlot`) .pull-left[ ```r plot_models( linear_model, logistic_model, linear_probability_model, std.est = "std" ) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> ] --- ## Adjusting it within the function .pull-left[ ```r plot_models( linear_model, logistic_model, linear_probability_model, std.est = "std", m.labels = c( "Linear", "Logistic", "Linear Probability" ), axis.labels = c( "Pol. Right (Ref. Center)", "Pol. Left (Ref. Center)", "Education", "Sex", "Age" ) ) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> ] --- ## But it's still a `ggplot2` object .pull-left[ ```r plot_models( linear_model, logistic_model, linear_probability_model, # std.est = "std", m.labels = c( "Linear", "Logistic", "Linear Probability" ), axis.labels = c( "Pol. Right (Ref. Center)", "Pol. Left (Ref. Center)", "Education", "Sex", "Age" ) ) + theme_bw() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] --- ## Plotting predictions from multiple models ```r linear_predictions <- get_model_data( linear_model, term = "age_cat", type = "pred" ) linear_predictions ``` ``` ## # Predicted values of mean_trust ## ## age_cat | Predicted | group_col | 95% CI ## ---------------------------------------------- ## 1 | 3.73 | 1 | [3.67, 3.79] ## 2 | 3.76 | 1 | [3.71, 3.82] ## 3 | 3.80 | 1 | [3.75, 3.84] ## 4 | 3.83 | 1 | [3.79, 3.87] ## 6 | 3.89 | 1 | [3.86, 3.92] ## 7 | 3.92 | 1 | [3.89, 3.95] ## 8 | 3.96 | 1 | [3.92, 3.99] ## 10 | 4.02 | 1 | [3.97, 4.07] ## ## Adjusted for: ## * sex = 1.49 ## * education_cat = 2.48 ## * pol_leaning_cat = center ``` --- ## Predictions from the other two models .pull-left[ ```r logistic_predictions <- get_model_data( logistic_model, term = "age_cat", type = "pred" ) ``` ] .pull-right[ ```r linear_probability_predictions <- get_model_data( linear_probability_model, term = "age_cat", type = "pred" ) ``` ] --- ## Combining it in one table ```r library(dplyr) library(tibble) all_predictions <- bind_rows( linear_predictions %>% mutate(model = "Linear Model"), logistic_predictions %>% mutate(model = "Logistic Model"), linear_probability_predictions %>% mutate(model = "Linear Probability Model") ) %>% as_tibble() ``` --- ## Plotting them with `facet_wrap()` .pull-left[ ```r ggplot( all_predictions, aes(x = x, y = predicted) ) + geom_line() + geom_line(aes(y = conf.high), linetype = "dashed") + geom_line(aes(y = conf.low), linetype = "dashed") + facet_wrap(~model) + theme_bw() ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> ] --- ## Plotting only the binary models .pull-left[ ```r only_two <- ggplot( all_predictions %>% filter(model != "Linear Model"), aes(x = x, y = predicted) ) + geom_line() + geom_line(aes(y = conf.high), linetype = "dashed") + geom_line(aes(y = conf.low), linetype = "dashed") + facet_wrap(~model) + theme_bw() only_two ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ] --- ## The same logic for real AME ```r library(margins) logistic_model_AME <- cplot( logistic_model, what = "prediction", draw = FALSE ) linear_probability_model_AME <- cplot( linear_probability_model, what = "prediction", draw = FALSE ) ``` --- ## Combining AME data ```r all_AME <- bind_rows( logistic_model_AME %>% mutate(model = "Logistic Model"), linear_probability_model_AME %>% mutate(model = "Linear Probability Model") ) %>% as_tibble() ``` --- ## Plotting them again .pull-left[ ```r only_two_AME <- ggplot( all_AME, aes(x = xvals, y = yvals) ) + geom_line() + geom_line(aes(y = upper), linetype = "dashed") + geom_line(aes(y = lower), linetype = "dashed") + facet_wrap(~model) + theme_bw() only_two_AME ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ] --- ## Combining them with previous predictions .pull-left[ ```r library(patchwork) (only_two + ggtitle("Simple Predictions") + ylab("Predicted Trust Value") + xlab("Age")) / (only_two_AME + ggtitle("AME Predictions") + ylab("Predicted Trust Value") + xlab("Age")) ``` ] .pull-right[ <img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> ]